Superconducting circuit for high-speed lookup table

ABSTRACT

A high-speed lookup table is designed using Rapid Single Flux Quantum (RSFQ) logic elements and fabricated using superconducting integrated circuits. The lookup table is composed of an address decoder and a programmable read-only memory array (PROM). The memory array has rapid parallel pipelined readout and slower serial reprogramming of memory contents. The memory cells are constructed using standard non-destructive reset-set flip-flops (RSN cells) and data flip-flops (DFF cells). An n-bit address decoder is implemented in the same technology and closely integrated with the memory array to achieve high-speed operation as a lookup table. The circuit architecture is scalable to large two-dimensional data arrays.

RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 12/258,682, filed Oct. 27, 2008, now U.S. Pat. No. 7,903,456, which is a continuation of U.S. patent application Ser. No. 11/360,749, filed Feb. 23, 2006, now U.S. Pat. No. 7,443,719, the entirety of which are expressly incorporated herein by reference.

GOVERNMENT CONTRACT

Research leading to this invention was supported in part by US Army Contract W15P7T-04-C-K417 and US Navy Contract N00039-04-C-2134.

FIELD OF THE INVENTION

This invention relates to superconducting integrated circuits, specifically the development of a fast superconducting lookup-table memory array, which may be applied to ultrafast digital signal processing.

BACKGROUND OF THE INVENTION

Ultrafast superconducting digital circuits are based on Josephson junctions integrated together according to rapid single flux quantum (RSFQ) logic, as originally developed and described by K. K. Likharev and V. K. Semenov (1991). Fast memory circuits in the same technology are also required for most non-trivial digital applications. One class of memory arrays is random-access memory, or RAM, which is particularly important for digital computing applications. Such applications require equally fast data writing and data retrieval. This is in contrast to many digital signal processing applications, in which the memory contents need to be read out quickly but updated only rarely, requiring a programmable read-only memory (PROM) instead of a RAM. A particular application of interest is a circuit for real-time digital predistortion of radio-frequency (RF) signals, where the predistortion parameters would be maintained in a digital lookup table.

There have been several circuits proposed for superconductor RAM, such as the Ballistic RAM circuit invented by Herr (U.S. Pat. No. 6,836,141). However, such a circuit does not help one design a fast PROM, which has a completely different architecture. There have been no prior publications or patents describing a PROM-type RSFQ memory array or lookup table.

The article by Bunyk et al., entitled "RSFQ Microprocessor: New Design Approaches," in IEEE Transactions on Applied Superconductivity, Vol. 7, No. 2, June 1997, pp. 2697-2704, utilizes an RSFQ data processing pipeline architecture, similar to but distinct from that used in the code matching network of the present invention.

SUMMARY OF THE INVENTION

A digital lookup table takes a digital input X and provides a digital output Y such that Y=F(X), for any function F that is programmed into the memory array. The output values for each input value are stored in memory, and are recalled as needed. The circuit of the present invention comprises an address decoder and a programmable read-only memory array (PROM). The memory array has rapid parallel pipelined readout and slower serial reprogramming of memory contents. The memory cells are constructed using standard RSFQ elements, the non-destructive reset-set flip-flops (RSN cells) and data flip-flops (DFF cells). An n-bit address decoder is implemented in the same technology and closely integrated with the memory array to achieve high-speed operation as a lookup table. A prototype section of a lookup table based on the invention, with a 3×4 address decoder and a 4×3 memory matrix, has been designed and fabricated on a 5 mm × 5 mm niobium superconducting integrated circuit, for target operation at a clock frequency of 20 GHz, which can result in a target rate of 20 Gwords/s. The circuit architecture is scalable to large two-dimensional data arrays.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a lookup table in accordance with one aspect of the invention.

FIG. 2 is a block diagram showing the components of the lookup table of FIG. 1.

FIG. 3 is a block diagram of a first code matching network implemented using a D flip-flop with complementary outputs in accordance with one aspect of the invention.

FIG. 4A shows a block symbol for a D flip-flop with complementary output.

FIG. 4B shows a column of the first code matching network implemented using a D flip-flop with complementary output.

FIG. 5 is a block diagram of a second code matching network in accordance with one aspect of the invention.

FIGS. 6A and 6B show a block symbol for a D flip-flop (without complementary output) and a block symbol for an inverter, respectively.

FIG. 6C shows an alternative column of an address decoder implemented using D flip-flops and inverters.

FIG. 7A shows signaling logic used when a code word match results in a logic 0 output.

FIG. 7B shows signaling logic used when a code word match results in a logic 1 output.

FIG. 8A is a block diagram showing the memory portion of FIG. 2 in more detail.

FIGS. 8B and 8C show a schematic and a block symbol, respectively, of an RS flip-flop circuit with non-destructive read out (NDRO).

FIG. 8D shows an exemplary architecture of memory cells shown in FIG. 8A.

FIG. 9 illustrates pipeline operation of the lookup table in accordance with one aspect of the invention.

FIGS. 10A and 10B show exemplary layouts of a D flip-flop and an inverter, respectively.

FIG. 11 shows a layout of a portion of an address decoder implemented using D flip-flops and inverters.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 shows a lookup table in accordance with one aspect of the invention. The lookup table, 100, accommodates a set of input lines, the states of which are labeled X_(i). Similarly, the output of the lookup table comprises a plurality of output lines, the output states of which are labeled Y_(j). For each input state, represented by the vector X, an output state is produced which is represented by the vector Y. The contents of the lookup table are programmed so as to produce a relationship Y=ƒ(X). As a result, the lookup table produces a functional transformation of the input states into a corresponding set of output states.

FIG. 2 is a block diagram showing the components of the lookup table of FIG. 1. The lookup table in accordance with the invention comprises an address decoder 210, comprising a code matching network 211 and signaling logic 212, coupled to a programmable read only memory array 220. The states of the input vector X are applied to the address decoder 210 and the address decoder 210 selects, using the programmable read only memory, the output states to be retrieved from the memory and applied to the output lines, resulting in the output state Y. The programming of the programmable read only memory is applied through the line labeled “serial programming”.

FIG. 3 is a block diagram of a first code matching network implemented using D flip-flops with complementary outputs. Each row of D flip-flops corresponds to a particular digital word and is activated by that digital word. For example, the third row shown in FIG. 3 is programmed to respond (or not respond) to the digital word 1001. The complementary output is connected to the line A_(i) for the first and last cells of the row, whereas the regular output is connected to line A_(i) for the middle two cells of the row. One can see that the input lines D_(i) at the top of the address decoder in FIG. 3 provide a digital word to be loaded to the first row A₁, where that row A₁ will either respond or not respond to the states of the digital word to produce an output A₁. In the instance shown in FIG. 3, when the digital word 1001 is applied to the inputs D₁ . . . D₄ and propagated down to the row A_(i), the logic one state in column D₁ will result in a zero output, the logic zero state in column D₂ will result in a zero output, the logic zero state in the third column will result in a zero output and, finally, the logic one state applied in column D₄ will be changed to a zero, so that the state of all outputs on line A_(i) is zero. Every other line A_(k), where k is not equal to i, will have at least one cell producing a logical one on the output of that line, resulting in all ones output with the exception of line A_(i), which will have a zero output.

The contents of the flip-flops of row A₁ are passed in pipeline fashion to the corresponding flip-flops in row A₂ and then to row A₃ and then down the remainder of the rows to the last row A_(N). Thus, a digital word input on the input lines D_(i) propagates in pipeline fashion through the address decoder. One should note that, in the example shown, the selection of regular and complementary outputs for each flip-flop in a row is set to produce no output on the output line A_(i) when the digital word for which it is programmed is applied to that row of D flip-flops.
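The row-matching rule just described can be sketched in Python as a purely logical model. The function names row_outputs and matches are hypothetical, and the sketch assumes one bit per cell and ignores the clocked, pulse-based nature of the actual RSFQ cells.

```python
def row_outputs(word, code):
    """Cell outputs on line A_i for a row hard-wired to `code` while `word` is resident.

    Assumption: a code bit of 1 means the cell's complementary output is wired
    to the line, and a code bit of 0 means the regular output is wired to it.
    """
    return [(1 - w) if c else w for w, c in zip(word, code)]


def matches(word, code):
    """A row signals a match when none of its cells drives a logic 1 onto the line."""
    return not any(row_outputs(word, code))


# The word 1001 applied to the row programmed for 1001 yields all zeros.
print(row_outputs([1, 0, 0, 1], [1, 0, 0, 1]))
```

Under these assumptions, any row hard-wired for a different word has at least one cell producing a logic 1, which is the behavior the text describes for the lines A_(k) with k not equal to i.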

FIG. 4A shows a block symbol for a D flip-flop with complementary output.

FIG. 4B shows another example of a column of an address decoder implemented using a D flip-flop with complementary output. This example corresponds to column D₄ of FIG. 3. When the output of a particular row of the code matching network is designed to produce logic 0 when the digital word for which a row is hardwired is applied to that row, signaling logic can be used to change the logic 0 state to a logic 1 output. In FIG. 4B, the signaling logic is comprised of a column of inverters.

The address decoder works on the premise of code matching. When the input address finds its match in a row of the code-matching part, the signaling logic for that row sends a “Read” pulse to the memory, as shown in FIGS. 3 and 4B. This scheme uses identical cells, D flip-flops with complementary outputs (DFFC), for the entire code-matching part of the decoder and is logically simple. In each column, the true (Q) output of each DFFC is connected to the data input (D) of the next DFFC below it. The output value is determined by choosing and hard-wiring either the true output or the complementary output to the output line.

FIG. 5 is a block diagram of a second code matching network in accordance with one aspect of the invention. Unlike the first code matching network shown in FIG. 3, which is comprised exclusively of D flip-flops with complementary outputs, this alternate arrangement takes note of the fact that in the version of FIG. 3, only a direct or inverted output is utilized. This permits an alternative code matching network to be constructed utilizing only D flip-flops (without complementary outputs) and inverters. In the version shown in FIG. 5, each row is programmed to produce an output when the code word that reaches that row matches the word shown at the extreme left of each output line A_(i) of FIG. 5. Each bit of the incoming digital word, D_(i), propagates down a respective column D₁-D₄. As it propagates down the column, it is either propagated unchanged by a D flip-flop or is inverted by an inverter. Although a number of arrangements are possible for the digital words, the programming shown in FIG. 5 is designed to map to a digital counting sequence from 0-15. Based on the binary value of the incoming digital word (e.g. 0101), the output for that digital word will appear, in the case of the example, on line A₅. Thus, the code matching network can be implemented using only D flip-flops and inverters without the need for complementary outputs on the D flip-flops.

FIGS. 6A and 6B show a block symbol for a D flip-flop (without complementary output) and a block symbol for an inverter, respectively.

FIG. 6C shows a column of an address decoder implemented using D flip-flops and inverters. In certain embodiments, it is desirable to avoid using D flip-flops with complementary outputs in favor of selectively arranging D flip-flops (without complementary outputs) together with inverters to produce the desired logical patterns, as in the example shown in FIG. 6C.

Each cell in the code-matching matrix performs two functions: (1) it produces an output to the signaling logic part, and (2) it allows synchronous data-flow down the column to the cell in the next row. Recognizing that as far as the output signaling is concerned, each DFFC, in this hard-wired configuration, works either as a DFF or as a NOT (clocked inverter) but never both, one can simplify the circuit complexity by choosing only one of them for each cell. Logically, this scheme is more complex because one has to account for inversions in the data flow-down path. One can do this by configuring the code-matching matrix column-by-column by placing a NOT cell to change the value (0-to-1 and 1-to-0) and a DFF cell when no change is needed. One column of such an arrangement (also corresponding to column D₄ of FIG. 3) is shown in FIG. 6C.
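The column-by-column configuration described above can be illustrated with a short Python sketch. It assumes, as in the FIG. 3 convention, that a matching cell presents a logic 0 to its output line, and that the raw input bit enters the top of the column uninverted. The function names column_cells and column_outputs are hypothetical, and the clocking delay between rows is ignored.

```python
def column_cells(code_bits):
    """Choose a DFF or NOT cell for each row of one column.

    `code_bits[k]` is the bit that row k is programmed to match in this column.
    A NOT cell is placed wherever the value presented to the row must change
    relative to the row above; otherwise a DFF cell passes it on unchanged.
    """
    cells = []
    previous = 0  # assumption: the bit enters the first row without inversion
    for bit in code_bits:
        cells.append("NOT" if bit != previous else "DFF")
        previous = bit
    return cells


def column_outputs(cells, d):
    """Value each cell presents to its output line (and to the cell below) for input bit d."""
    outputs = []
    value = d
    for cell in cells:
        if cell == "NOT":
            value = 1 - value
        outputs.append(value)
    return outputs


cells = column_cells([0, 1, 1, 0, 1])  # hypothetical code bits for five rows
print(cells)
print(column_outputs(cells, 1))
```

In this model each cell output equals the input bit XOR the row's code bit, so it is 0 exactly where the bit matches, which is the same pattern the DFFC arrangement produces with hard-wired regular or complementary outputs.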

It is possible to design the code matching network to produce a logic 1 when a match occurs. The signaling logic used when a code match produces a logic 0 on an output line differs from the signaling logic used when a code match produces a logic 1 on that line.

FIG. 7A shows signaling logic used when a code word match results in a logic 0 output.

FIG. 7B shows signaling logic used when a code word match results in a logic 1 output. With logic 1 outputs, provision should be made to ensure that the output pulses for a row are not coincident, so they can be individually counted. This can be accomplished, for example, by introducing differential delays in the pulses traversing different columns of the code matching network.

FIG. 8A is a block diagram showing the memory portion of FIG. 2 in more detail. In the example shown in this figure, the input applied to lines D₁-D_(n) is an n-bit signal. Such a signal might be generated, for example, when an incoming RF signal is oversampled and a digital value of each sample is applied sequentially to the n-bit address decoder. The states of the n-bit signal comprise the input X, discussed previously. The n-bit address decoder has a number of output lines, one corresponding to each state of the input vector X. Thus, for binary signals, the number of output lines of the n-bit address decoder is N=2^(n). Each output line labeled A_(k) feeds a row of the memory array shown in FIG. 8A. There is an output line A_(k) for each state of the input lines D_(i); that is, each different state of the input lines activates a selected output line A_(k), which then activates a row of the memory array.
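As a minimal illustration of the relationship N=2^(n), the following Python sketch maps an n-bit input to a one-hot set of output line states. The function name decode is hypothetical. The sketch assumes D₁ is the most significant bit, that lines are numbered by the binary value of the input (so 0101 selects line 5, as in the FIG. 5 example), and that an active line is represented as 1, i.e. after the signaling logic.

```python
def decode(bits):
    """Return the states of the N = 2**n output lines for an n-bit input.

    Exactly one line is active (1) for each input state.
    Assumption: bits[0] is the most significant bit.
    """
    index = int("".join(str(b) for b in bits), 2)
    return [1 if k == index else 0 for k in range(2 ** len(bits))]


lines = decode([0, 1, 0, 1])
print(len(lines), lines.index(1))
```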

To the upper left of the memory array shown in FIG. 8A is a “Serial Write” input. That Serial Write input line is utilized to load the contents of the non-destructive read out cells of the memory. One can see by following the Serial Write line through the memory array that the cells are loaded in a serial fashion by sequentially clocking the input data through the array row by row, with some rows being loaded left to right and others being loaded right to left. Although the memory array is pre-loaded serially with the desired output of the lookup table, the read out is accomplished in parallel fashion as discussed more hereinafter. A write cycle at 1 Gbit/s would be completed in about half a microsecond for a memory with 64 8-bit words.
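The write-time estimate follows from simple arithmetic: 64 words of 8 bits are 512 bits, which at 1 Gbit/s take 512 ns. The Python sketch below reproduces that figure and lists one possible serial loading order. The function names are hypothetical, and the strict alternation of direction from row to row is an assumption; the text states only that some rows are loaded left to right and others right to left.

```python
def serial_write_time(words, bits_per_word, bit_rate_hz):
    """Seconds needed to clock every bit of the array in through one serial line."""
    return words * bits_per_word / bit_rate_hz


def serpentine_order(rows, cols):
    """Cell visiting order when successive rows are loaded in alternating directions.

    Assumption: even-numbered rows run left to right, odd-numbered rows right to left.
    """
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order


print(serial_write_time(64, 8, 1e9))  # 64 words x 8 bits at 1 Gbit/s, in seconds
print(serpentine_order(2, 3))
```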

FIGS. 8B and 8C show a schematic and a block symbol, respectively, of an RS flip-flop with non-destructive read out (NDRO). The block symbol utilized in the depiction of the memory cells is shown in FIG. 8C. The schematic for the RSN circuit is shown in FIG. 8B. It is comprised of Josephson junctions, indicated by the symbol X. The operation of NDRO cells is described in the article by Likharev and Semenov referenced previously.

FIG. 8D shows an exemplary architecture of memory cells of FIG. 8A. Each memory cell is comprised of an RS flip-flop with non-destructive read out (NDRO), labeled hereinafter RSN, plus a type D flip-flop labeled D. Each row of the memory cells is accessed by activating its respective line A_(i), which causes the NDRO cell RSN to transfer its contents to the type D flip-flop D. The entire row is read out at one time, and so the contents of the entire row of RSNs are transferred to the corresponding type D flip-flops for that row.

The type D flip-flops of a given row then transfer (vertically as shown) their contents to the next type D flip-flop in the column, which then transfers its contents to the next type D flip-flop in the column, and so on down to the output of the final type D flip-flop for a column, which is applied to an output bus.

As discussed in conjunction with FIG. 8A, the contents of the memory cells are loaded sequentially. This is illustrated in FIG. 8D by the serial write input providing in a serial fashion the contents for each of the RSN portions of the memory cells. As shown, each of the RSN memory cells is linked in a serial fashion for writing of the contents of those memory cells via the Serial Write input.

FIG. 9 illustrates the pipeline operation of the lookup table in accordance with one aspect of the invention. As mentioned above, the n-bit input signal is applied to the input lines D_(i) of the code matching network. Each digital word received at input lines D_(i) is propagated in pipeline fashion down the columns of the code matching network, where the decoding operation previously described occurs. The output state of a particular line A_(i) depends on whether or not the digital word currently resident in the row of the address decoder corresponds to the digital word for which that row has been programmed to respond. If the digital word does correspond, the output on line A_(i) is zero, in the embodiment shown, which is inverted by signaling logic (inverter) to the right of line A_(i) to logical 1. As a result, each digital word in the address decoder will either produce (1) a logical zero on output line A_(i) to which it corresponds or (2) a logical 1. The logical zero output on line A_(i) will occur when the digital word in that row of D flip-flops with complementary outputs matches the code word for which that row is programmed. If it does match, the inverter will change the output state from logic zero to logic 1, thereby activating a transfer of the contents of the NDRO cells of row A_(i) of the Programmable Read Only Memory to the output flip-flops for propagation down to the output bus.

The signaling logic elements shown in FIG. 9 are inverters, as shown in FIG. 7A. This is convenient when the output line A_(i) has a zero when the code word matches the logical states for which that line is programmed. However, in some situations it is desirable to produce a logical one, instead of a zero, on a particular line A_(i) of the address decoder. This is illustrated in FIG. 7B.

As shown in FIG. 7B, as the logical one output from each of the DFFCs is applied to line A_(i), a counter, in this case modulo n (here n=4), is incremented. When the last pulse is received, a carry output will trigger the activation of the memory row R_(i). The counters may need to be periodically reset.
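The counter-based signaling logic can be sketched in Python as follows. This is a behavioral model only: the class RowCounter and the function row_read_pulse are hypothetical names, each matching cell is assumed to emit one non-coincident logic-1 pulse, and a fresh counter per word stands in for the periodic reset mentioned above.

```python
class RowCounter:
    """Modulo-n pulse counter whose carry output activates a memory row."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.count = 0

    def pulse(self):
        """Register one logic-1 pulse; return True when the carry fires."""
        self.count += 1
        if self.count == self.n_bits:
            self.count = 0
            return True
        return False

    def reset(self):
        """Clear any partial count left by a non-matching word."""
        self.count = 0


def row_read_pulse(word, code):
    """True if all n cells of the row emit a pulse, i.e. the word matches the code."""
    counter = RowCounter(len(code))  # fresh counter models a reset before each word
    carry = False
    for w, c in zip(word, code):
        if w == c:  # assumption: a matching cell emits a logic-1 pulse
            carry = counter.pulse() or carry
    return carry


print(row_read_pulse([1, 0, 0, 1], [1, 0, 0, 1]))
print(row_read_pulse([1, 0, 0, 0], [1, 0, 0, 1]))
```

Without the reset, a partial count left by a near match (three pulses for n=4, say) would let a single pulse from a later word fire the carry, which suggests why the counters may need resetting.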

Both the input n-bit words and the output from the rows of the memory array operate in pipeline fashion. Specifically, with each clock cycle, digital words originally input on the input lines of the address decoder D_(i) are propagated sequentially through the rows of the address decoder in a continuous fashion. Similarly, the outputs of rows of the memory array which are selected by the output lines A_(i) of the address decoder are propagated in sequential fashion down the columns of D flip-flops until they reach the output bus, which serves as the output of the lookup table.

Note that a digital word input at the input of the address decoder may take several clock cycles before it finds a match in the address decoder, which will then trigger the output of the corresponding row of the memory array. When it does, the output from the memory cells of that row is applied to the D flip-flops and continues down in pipeline fashion to the output bus. As a result, each digital word applied to the input lines D_(i), as it traverses the address decoder in pipeline fashion, will activate one of the output lines, which will result in transfer of the contents of a row of the memory array into the corresponding D flip-flops for passing down the memory array pipeline to the output bus. Even though the output for a particular digital word might actually be selected subsequent to selection of a different digital word, the overall ordering of the output words on the output bus will be strictly in sequential order corresponding to the input order of the n-bit digital words applied to the input. With 4-bit input numbers (0 to 15), the total throughput delay in all cases will be 18 clock cycles (τ).
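The constant delay and preserved ordering can be checked with a toy cycle-level model in Python. The stage accounting here is an assumption chosen for illustration, not taken from the figures: one cycle to latch a word into the first decoder row, one cycle per decoder row, one cycle for the signaling inverter, one cycle per remaining memory row, and one cycle to reach the output bus. With N=16 rows this accounting gives N+2=18 cycles, consistent with the figure quoted above. The function name simulate and the table contents are hypothetical.

```python
def simulate(addresses, table, n_bits):
    """Toy cycle-level model of the pipelined lookup table.

    One address is presented per clock cycle. Returns a list of
    (input_cycle, output_cycle, value) tuples in output-bus order.
    """
    n_rows = 2 ** n_bits
    decoder = [None] * n_rows  # (input_cycle, address) resident in each decoder row
    pulse = [None] * n_rows    # input_cycle of a read pulse leaving each row's inverter
    column = [None] * n_rows   # (input_cycle, value) held in each row's output DFF
    pending = list(enumerate(addresses))
    results = []
    cycle = 0
    while pending or any(x is not None for x in decoder + pulse + column):
        # Whatever sits in the last output DFF reaches the output bus this cycle.
        if column[-1] is not None:
            results.append((column[-1][0], cycle, column[-1][1]))
        # Output DFFs shift down one row.
        column = [None] + column[:-1]
        # A read pulse copies the row's stored word into that row's output DFF.
        for k, p in enumerate(pulse):
            if p is not None:
                column[k] = (p, table[k])
        # Signaling logic: a match in row k yields a read pulse one cycle later.
        pulse = [d[0] if d is not None and d[1] == k else None
                 for k, d in enumerate(decoder)]
        # Decoder rows shift down and the next address enters the first row.
        decoder = [pending.pop(0) if pending else None] + decoder[:-1]
        cycle += 1
    return results


table = [3 * k + 1 for k in range(16)]  # hypothetical programmed contents, Y = F(X)
for t_in, t_out, value in simulate([5, 0, 15, 3], table, 4):
    print(t_in, t_out, t_out - t_in, value)
```

In this model a word that matches an early decoder row spends correspondingly longer travelling down the memory columns, so the two delays always sum to the same total and the outputs emerge in input order.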

FIGS. 10A and 10B show exemplary layouts of a D flip-flop and inverter, respectively.

FIG. 11 shows a layout of a portion of an address decoder implemented using D flip-flops and inverters.

While various embodiments of the present invention have been illustrated herein in detail, it should be apparent that modifications and adaptations to those embodiments may occur to those skilled in the art without departing from the scope of the present invention as set forth in the following claims.

1. A pipelined multi-bit processor, comprising: an input port configured to receive a multibit digital value; a processing network comprising a pipeline of successive processing stages employing superconducting elements, wherein the digital value is transformed in dependence on pipeline logic and passed to succeeding stages of the pipeline in dependence on a clock cycle; and at least one output port, configured to present an output in dependence on the received digital value, the clock cycle, the pipeline logic and a respective stage of the pipeline with which the output port is associated.
2. The pipelined multi-bit processor according to claim 1, wherein the pipeline logic implements a code matching network, configured to generate a plurality of output signals corresponding to the multibit digital value, further comprising a plurality of code-matching cells organized into rows and columns, each cell comprising a clocked rapid-single-flux-quantum device, having a column associated with each bit of the multibit digital value, and a row associated with a respective output port, wherein the pipeline logic is configured to provide a variable time delay between a receipt of the multibit digital value at the input port, and presentation of the output at the output port, of an integral number of clock cycles, a value of the integral number varying in dependence on the multibit digital value.
3. The pipelined multi-bit processor according to claim 2, further comprising a pipelined memory array, configured to receive the output from the output port, and to produce a memory output representing a memory contents of at least one memory cell at an address of the pipelined memory array defined by the multibit digital input, the memory output having a total time delay between receipt of the multibit digital value and production of the memory output of an integral number of clock cycles that is independent of the multibit digital value.
4. The pipelined multi-bit processor according to claim 2, wherein each code-matching cell comprises a clocked data flip-flop having a regular output and complementary output, wherein either the regular output or the complementary output of the data flip-flop is connected to a respective network output line, depending on a bit value of a respective bit of the multibit digital value.
5. The pipelined multi-bit processor according to claim 2, wherein each code-matching cell comprises either a clocked data flip-flop or a clocked inverter, whereby the output of the code-matching cell is connected to the input of the succeeding row, as well as to a respective network output line.
6. The pipelined multi-bit processor according to claim 1, wherein the transformed multibit digital value is passed to succeeding stages.
7. The pipelined multi-bit processor according to claim 1, wherein the multibit digital value is passed to succeeding stages in a non-transformed state.
8. A pipelined processing method, comprising: receiving a multibit digital value; processing the received multibit digital value with a processing network comprising a pipeline of successive processing stages employing superconducting elements, wherein the digital value is transformed in dependence on pipeline logic and passed to succeeding stages of the pipeline in dependence on a clock cycle; and presenting an output in dependence on the received digital value, the clock cycle, the pipeline logic and a respective stage of the pipeline with which the output port is associated, wherein the pipeline logic implements a code matching network, generating a plurality of output signals corresponding to the multibit digital value, further comprising a plurality of code-matching cells organized into rows and columns, each cell comprising a clocked rapid-single-flux-quantum device, having a column associated with each bit of the multibit digital value, and a row associated with a respective output port, wherein the pipeline logic provides a variable time delay between a receipt of the multibit digital value at the input port, and presentation of the output at the output port, of an integral number of clock cycles, a value of the integral number varying in dependence on the multibit digital value, and the output port is received by a pipelined memory array, which produces a memory output representing a memory contents of at least one memory cell at an address of the pipelined memory array defined by the multibit digital input, the memory output having a total time delay between receipt of the multibit digital value and production of the memory output of an integral number of clock cycles that is independent of the multibit digital value.
9. The method according to claim 8, wherein each code-matching cell comprises a clocked data flip-flop having a regular output and complementary output, wherein either the regular output or the complementary output of the data flip-flop is connected to a respective network output line, depending on a bit value of a respective bit of the multibit digital value.
10. The method according to claim 8, wherein each code-matching cell comprises either a clocked data flip-flop or a clocked inverter, whereby the output of the code-matching cell is connected to the input of the succeeding row, as well as to a respective network output line.
11. The method according to claim 8, wherein the transformed multibit digital value is passed to succeeding stages.
12. The method according to claim 8, wherein the multibit digital value is passed to succeeding stages in a non-transformed state.
13. A pipelined multi-bit processor comprising a code matching network, configured to generate a plurality of output signals corresponding to a multibit digital input value received at an input port, comprising a plurality of code-matching cells organized into rows and columns, each cell comprising a clocked rapid-single-flux-quantum device, having a row associated with a respective multibit digital value, and columns of the row which together define an output value, configured to selectively provide a variable time delay of an integral number of clock cycles between a receipt of the multibit digital value at the input port, and presentation of the output at an output port, a value of the integral number selectively varying in dependence on the multibit digital value.
14. The pipelined multi-bit processor according to claim 13, further comprising a pipelined memory array, configured to receive the output from the output, and to produce a memory output representing a memory contents of at least one memory cell corresponding to the output, the memory output having a total time delay between receipt of the multibit digital value and production of the memory output of an integral number of clock cycles that is independent of the multibit digital value.
15. The pipelined multi-bit processor according to claim 13, wherein each code-matching cell comprises a clocked data flip-flop having a regular output and complementary output, wherein either the regular output or the complementary output of the data flip-flop is connected to a respective network output line, depending on a bit value of a respective bit of the multibit digital value.
16. The pipelined multi-bit processor according to claim 13, wherein each code-matching cell comprises either a clocked data flip-flop or a clocked inverter, whereby the output of the code-matching cell is connected to the input of a succeeding row, as well as to a respective network output line.